pyngoST: fast, simultaneous and accurate multiple sequence typing of Neisseria gonorrhoeae genome collections

Extensive gonococcal surveillance has been performed using molecular typing at global, regional, national and local levels. The three main genotyping schemes for this pathogen, multi-locus sequence typing (MLST), Neisseria gonorrhoeae multi-antigen sequence typing (NG-MAST) and N. gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR), allow inter-laboratory and inter-study comparability and reproducibility and provide an approximation to the gonococcal population structure. With whole-genome sequencing (WGS), we obtain a substantially higher and more accurate discrimination between strains compared to previous molecular typing schemes. However, WGS remains unavailable or not affordable in many laboratories, and thus bioinformatic tools that allow the integration of data among laboratories with and without access to WGS are imperative for a joint effort to increase our understanding of global pathogen threats. Here, we present pyngoST, a command-line Python tool for fast, simultaneous and accurate sequence typing of N. gonorrhoeae from WGS assemblies. pyngoST integrates MLST, NG-MAST and NG-STAR, and can also designate NG-STAR clonal complexes, NG-MAST genogroups and penA mosaicism, facilitating multiple sequence typing from large WGS assembly collections. Exact and closest matches for existing alleles and sequence types are reported. The implementation of a fast multi-pattern searching algorithm allows pyngoST to be rapid and report results on 500 WGS assemblies in under 1 min. The mapping of typing results on a core genome tree of 2375 gonococcal genomes revealed that NG-STAR is the scheme that best represents the population structure of this pathogen, emphasizing the role of antimicrobial use and antimicrobial resistance as a driver of gonococcal evolution. This article contains data hosted by Microreact.


INTRODUCTION
Molecular typing of bacterial species revolutionized epidemiological surveillance of pathogens in the late 20th century, especially after the development of DNA-based techniques such as PCR amplification.In the 1990s, multi-locus sequence typing (MLST) [4] emerged as a highly standardized molecular typing method that involved the PCR amplification and traditional Sanger DNA sequencing of fragments of several housekeeping genes to define unique allelic profiles for bacterial strains.The different allelic profiles are subsequently designated as divergent sequence types (STs).This methodology greatly enhanced our understanding of bacterial pathogen epidemiology, antimicrobial resistance (AMR) and surveillance [5].
For the sexually transmitted human pathogen Neisseria gonorrhoeae, three different sequence-based typing schemes have been used most frequently (Table 1): Neisseria spp.MLST [6], N. gonorrhoeae multi-antigen sequence typing (NG-MAST) [7], and N. gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR) [8].The Neisseria spp.MLST scheme was developed to characterize molecular lineages of N. gonorrhoeae, Neisseria meningitidis and Neisseria lactamica and examines fragments of seven slowly evolving housekeeping genes (abcZ, adk, aroE, fumC, gdh, pdhC and pgm) [6].In contrast, NG-MAST examines fragments of two highly variable genes, porB and tbpB, encoding the PorB (Porin B) and TbpB (Transferrin-binding protein B) outer membrane proteins and it was developed to enhance strain discrimination [7].Due to the limitation of examining only two highly variable loci, often including recombination events, NG-MAST is typically not representative of the population structure of N. gonorrhoeae, but can be grouped into genogroups following a set of similarity rules between the alleles of different isolates to improve the representation of its population structure [9,10].However, NG-MAST genogroups are dataset-specific, ST-genogroup relationships are not stable over time and, thus, genogroups are not comparable between studies or time points [1,9].Alternative methods are recommended, such as NG-STAR, for a more robust and transferable strain characterization that also reflects AMR.

Impact Statement
Molecular typing has been key for Neisseria gonorrhoeae epidemiological and antimicrobial resistance (AMR) surveillance, and whole genome sequencing (WGS) has revolutionized this typing.The most frequently used molecular typing schemes include MLST, NG-MAST, NG-STAR, and modifications of those.These schemes can be extracted from WGS assemblies for comparability and reproducibility of results with laboratories that do not have access to WGS technologies.pyngoST is a unique command-line Python tool that integrates all these common typing schemes under the same framework and performs rapid simultaneous user-defined multiple typing of large numbers of gonococcal genomes through a fast multi-pattern searching algorithm.Typing results on 2375 gonococcal genomes revealed that NG-STAR best represents the genomic population structure of N. gonorrhoeae, highlighting the importance of antimicrobial use and AMR on the evolution of this pathogen.Monitoring AMR gonococcal lineages has been stated as crucial to combat AMR gonococci [1,9,11].The NG-STAR scheme was developed as a standardized AMR typing tool that combined strain typing with AMR profiling [8].NG-STAR examines sequence fragments of seven AMR determinants associated with resistance or decreased susceptibility to antibiotics, such as penicillin, ciprofloxacin, the extended-spectrum cephalosporins (ESCs) cefixime or ceftriaxone, and azithromycin: penA, mtrR, porB, ponA, gyrA, parC and 23S rRNA.This typing scheme allows monitoring the emergence and spread of lineages with different degrees of antimicrobial susceptibility or AMR.NG-STAR STs can be grouped into clonal complexes (CC) for a more consistent fit with the N. gonorrhoeae population structure, simplifying nomenclature and providing reproducibility into AMR lineage definition [12].
Despite MLST being a breakthrough for pathogen epidemiology, the real revolution arrived with whole genome sequencing (WGS) [13,14], which provided a comprehensive view of genomes, enabling high-resolution strain typing, phylogenetic analysis, and detailed characterization of genetic variations and AMR determinants.The progressive decrease in the cost of high-throughput short-read sequencing has made WGS a cost-effective tool compared to performing multiple PCRs followed by Sanger sequencing for strain typing.However, molecular typing tools are still in use in public health laboratories because they are used for simplified naming of strains, clones or clades and they are included in standardized protocols to ensure inter-laboratory reproducibility and quality of the results.The standardization of WGS-based protocols for epidemiological surveillance of pathogens is still undergoing, as it requires specialized infrastructure and personnel training [15].
Traditionally, allele sequences and allelic profiles are manually submitted to web servers, revised and assigned by data curators.With the appropriate bioinformatic tools, molecular typing alleles can be extracted from bacterial genomes.To facilitate molecular typing from collections of genome sequences, command-line tools have been developed, such as mlst [16], SRST2 [17], ARIBA [18] or NGMASTER, which is specific for the NG-MAST scheme [19].The rules to calculate NG-MAST genogroups have been reported in previous studies [9,10], but there is still no publicly available tool that implements them.Command-line tools that automatically perform large-scale NG-STAR or NG-STAR CC typing are lacking, with the exception of the R tool WADE [20] that is able to extract MLST, NG-MAST and NG-STAR typing from genomes as an R Shiny application or a standalone tool.Open access online platforms, such as Pathogenwatch [21] or PubMLST [22], can also extract typing information from genome assemblies for several species, including MLST, NG-MAST and NG-STAR for N. gonorrhoeae.However, these tools only retrieve the canonical typing schemes and the analysis of large numbers of genomes in these servers have inherent limitations that compromise speed.As a benefit, command-line tools can be modularly integrated into local bioinformatics pipelines with very different focuses.
Here, we present pyngoST, a Python command-line tool that unifies molecular typing of N. gonorrhoeae from genome assemblies.pyngoST implements the Aho-Corasick (AC) multi-string searching algorithm for an ultrafast simultaneous screening of all loci included in the MLST, NG-MAST and NG-STAR schemes on the input assembly files.pyngoST can also output NG-STAR CCs and information of penA mosaics, and implements the calculation of NG-MAST genogroups.We also provide an evaluation of the concordance of these typing schemes with the gonococcal population structure obtained from WGS.

Implementation of pyngoST
pyngoST has been modularly implemented in Python 3.8 under a GNU GPLv3 licence and is available from Github (https://github.com/leosanbu/pyngoST).Installation can be performed from the source code or from Pypi using pip.pyngoST is written as an open-source tool containing a main script pyngoST.py, where the principal functions are contained in and called from pyngoST_ utils.py.The main dependencies include the pyahocorasick module [23] and Biopython [24].pyahocorasick is a fast and memoryefficient library written in C that contains the ahocorasick Python module for multi-pattern string searching via the AC algorithm.
Use of pyngoST requires the building of a local database, from which three different modules can be run (Fig. 1): (1) molecular typing of N. gonorrhoeae from assemblies in fasta format by any of MLST, NG-MAST, NG-MAST genogroups, NG-STAR and NG-STAR CCs, including the type of penA mosaicism (mosaic, semi-mosaic or non-mosaic) according to https://ngstar.canada.ca; (2) sequence typing from multiple allelic profiles of any of MLST, NG-MAST, NG-STAR and NG-STAR CCs contained on a tabular or comma-separated table, that can also include the type of penA mosaicism; and (3) classification of NG-STAR STs into NG-STAR CCs from a tabular or comma-separated table .NG-MAST genogroups are calculated following previously described rules [1,9,10]: NG-MAST profiles are assigned into a genogroup if one identical allele is shared and the other allele shows ≥99 % similarity (≤5 bp difference for porB and ≤4 bp difference for tbpB) or the concatenated sequence of both alleles shows ≥99.4 % similarity to the concatenated sequence of both alleles of the main ST in the genogroup.
Novel allele sequences of any of the three main schemes (MLST, NG-MAST and NG-STAR) can be optionally assigned to the closest match in the database by using a local installation of BLASTn [25].These new sequences can be extracted from the assemblies and saved as fasta files so the user can submit them to the corresponding web server for assignment by a data curator.
Large data collections can benefit from this task through the implementation of a ThreadPoolExecutor via the concurrent.futuresPython module [26].

Database building
Use of pyngoST relies on the building of a local database, which can be downloaded or updated using the -d [calls the down-load_db() function] and -u [calls the update_db() function] options, respectively.MLST (PubMLST scheme ID=1, seven loci) and NG-MAST (PubMLST scheme ID=71, NG-MAST v2, two loci) allele sequences and ST profiles are downloaded from PubMLST (https://pubmlst.org[22]), while NG-STAR allele sequences of the seven loci, type of penA mosaicism (mosaic, semi-mosaic Fig. 1.Structure of the pyngoST algorithm.From a database of allele sequences and sequence type (ST) profiles, pyngoST performs simultaneous molecular typing of Neisseria gonorrhoeae assemblies using the multi-pattern string searching Aho-Corasick algorithm through three schemes: N. gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR) [8], multi-locus sequence typing (MLST) [6] and N. gonorrhoeae multi-antigen sequence typing (NG-MAST) [7] (Table 1).NG-STAR clonal complexes (CCs) [12], NG-MAST genogroups [9,10] and penA mosaicism can also be reported.Closest matches of new alleles to existing sequences can be computed using blast [25] and these can be saved for manual submission by the user to the corresponding web servers.pyngoST can also be used to only obtain NG-STAR CCs from NG-STAR STs according to the database as well as to report STs from a table of allelic profiles of any of the MLST, NG-MAST and NG-STAR schemes.
or non-mosaic) and the assigned ST are downloaded from https://ngstar.canada.ca[8] (Table 1).Allele sequences and ST profiles are stored in python dictionaries.These are used to create an AC automaton [make_ACautomaton() function] using the pyahocorasick module, which is encapsulated into a pickle file that is loaded in subsequent pyngoST runs to ensure speed in database loading.As a reference, a database downloaded in August 2023 contained 44 963 STs and 26 686 alleles (Table 1).As draft assemblies are often used, the reverse complementary sequence of each allele is obtained and stored with the database, which contained 53 372 sequences.
Note that the web server of the original NG-MAST scheme [7] is no longer available and that a refined and optimized version, NG-MAST v2, curated and maintained in PubMLST [22], is used by default by pyngoST.This improved version performs an additional curation of porB and tbpB sequences to account for the hyper-variability found in this scheme.However, older versions of the latest available NG-MAST database could be manually included by the user instead of this optimized version.This is important as it can lead to some inconsistencies when comparing NG-MAST v2 results with the original NG-MAST results from previous studies.
NG-STAR CCs are obtained by applying a full goeBurst algorithm on an up-to-date NG-STAR database and grouping those as previously described [12].NG-STAR CCs are integrated into the database at the time of building with the -cc option [calls the integrate_ngstar_ccs() function] by providing a comma-separated file containing NG-STAR STs and the associated CCs.There is still not a regular publicly available update of NG-STAR CCs, but the latest run (from May 2023) is available from the pyngoST Github repository, and will be posteriorly further updated and shared with the scientific community.If the database files are manually modified (i.e. by adding new STs or allele sequences), the AC automaton and pickle file can be updated with -u.If NG-STAR CCs were not provided at the time of database building, they can be provided when updating.

Tests of performance
Ten random genome sets of different sizes (25, 50, 100, 200, 400, 800 and 1600 genomes) were obtained from the Euro-GASP 2018 dataset (n=2375 genomes) [1].The performance of pyngoST was evaluated by computing the run time of different combinations of parameters: (1) only retrieving exact allele matches from the database for MLST, NG-MAST, NG-STAR and NG-STAR CCs, (2) blasting new alleles for MLST, NG-MAST and NG-STAR and reporting the closest alleles, (3) computing NG-MAST genogroups from different data set sizes and (4) using multithreading.All tests were run on a MacBook Pro 2.3 GHz Intel Core i7 laptop with four cores, eight threads and 16 GB RAM.

Reference mapping and phylogenetic reconstruction
Fastq files of the 2375 isolates from the Euro-GASP 2018 dataset (ENA project PRJEB34068) were quality-checked using FastQC v0.11.9 [27] and trimmed with Trimmomatic v0.39 [28] with a sliding window size of 4 until the average quality within the window fell below a Phred score of 20.Initial and terminal bases with a Phred score below 25 were removed.Clean fastq files were mapped to the N. gonorrhoeae FA1090 reference genome (NCBI accession NC_002946; 2 153 922 bp) with Snippy v4.6.0 [29], by selecting a minimum coverage of 10 reads and 75 % support for calling variants.Repeats as identified with RepeatMasker v4.0.9 [30] on the reference genome and recombination events as detected by Gubbins v3.2.1 [31] were masked from the alignment.ClipKIT v1.3.0 [32] was used with the smart-gap approach to remove uninformative sites caused by missing data due to the absence of mapping and the masking step.A clean alignment resulting from mapping and masking was used to reconstruct a maximumlikelihood phylogenetic tree with IQ-TREE v2.0.6 [33] with the ultrafast bootstrap option with 1000 replicates.Pairwise SNPs in the resulting non-recombining genome aligment were computed using SNP-dists 0.8.2 [34].

Representation of the gonococcal population structure
Genome assemblies from 2375 isolates from the European Gonococcal Antimicrobial Surveillance Programme (Euro-GASP) 2018 WGS study were downloaded from Pathogenwatch (https://pathogen.watch/collection/eurogasp2018)[1,2].pyngoST was run on the assemblies to obtain MLST STs, NG-MAST STs, NG-MAST genogroups, NG-STAR STs, NG-STAR CCs and types of penA mosaicism.Consistency and retention indexes (CI and RI) were calculated using the phangorn v2.7.1 R package [35] to estimate the fitness of each typing scheme to the maximum-likelihood phylogenetic tree under a parsimony model.The CI measures the relative homoplasy of the data and the RI measures monophyly.Values of 1 represent a perfect fit.
FastBAPS [36] was run on the masked alignment obtained from mapping the Euro-GASP 2018 data as detailed above and using the reconstructed recombination-free tree as prior to find the best partition at two hierarchical levels under the Dirichlet Process Mixture model.These partitions were considered as the 'true' classifications of the genomes into groups.We evaluated how close the classification obtained from each of the typing schemes were from these 'true' classifications obtained by FastBAPS using multiple correspondence analysis (MCA), which was identified as the most appropriate statistical test to compare different profiles of categorical variables.The FactoMineR [37] and factoextra R [38] packages were used to perform the MCA and visualize results.The CI and RI were also computed from the obtained partitions.

RESULTS AND DISCUSSION
N. gonorrhoeae is of the top WHO priority pathogens due AMR [39,40] and the results of epidemiological surveillance that use either molecular typing or genomic data are constantly being published, justifying the need for tools that rapidly and accurately integrate both types of data.Here, we present pyngoST (Fig. 1), a command-line-based Python tool that unifies molecular typing for N. gonorrhoeae, for which several typing schemes have been used to discriminate among lineages since before the genomic era.Initially, typing relied on time-consuming amplification-based protocols followed by capillary sequencing of a panel of housekeeping (MLST) or highly variable genes (NG-MAST).More recently, sequencing of a panel of genes associated with AMR (NG-STAR) was shown to represent main genomic lineages similarly to MLST but with the benefit of additionally representing AMR profiles.Modifications of NG-STAR and NG-MAST have also been used in the form of NG-STAR CCs and NG-MAST genogroups, respectively.With the release of pyngoST, we provide a framework for the unification of these five typing schemes that can be used in a fast, memory-efficient and accurate manner on large collections of genome assemblies.pyngoST can also output the type of penA mosaicism (mosaic, semi-mosaic or non-mosaic) as well as define STs and identify NG-STAR CCs from tables containing allelic profiles, being also a useful tool for integrating typing of N. gonorrhoeae in laboratories without access to WGS and high-performance computing.

Fast, simultaneous and accurate molecular typing from genome assemblies
pyngoST was conceived to be fast when run even in large collections of genome assemblies.It uses a simultaneous multi-string searching algorithm for exact matching of a database of allelic sequences in direct and reverse orientation.The database used at the time of preparing this paper contained 53 372 sequences representing the forward and reverse sequences of all the alleles in the seven MLST, two NG-MAST and seven NG-STAR loci.We ran pyngoST on the 2375 genomes of the Euro-GASP 2018 genomic survey [1] on a laptop computer and it took 3.84 min to accurately report the profiles and STs of the three schemes plus the NG-STAR CCs using exact matches to existing alleles and STs in the database.Using the -b option to obtain the closest known alleles to new alleles using BLASTn on the whole dataset pyngoST took 11.87 min.However, in this mode, the run time will be dependent on the number of new alleles in the dataset.Ten random genome sets of different sizes (25, 50, 100, 200, 400, 800 and 1600 genomes) from the same study [1] were used to evaluate the performance of pyngoST in terms of speed, and a linear increase in run time with an increasing number of genomes was observed.Computation time ranged from a minimum of 3.2 s for a 25-genome dataset to 2.3 min for a 1600-genome dataset.Requesting the closest known alleles to new sequences (-b) resulted in a minimum computation time of 15.7 s in a dataset of 25 genomes and a maximum computation time of 18.8 min in a dataset of 1600 genomes.If different isolates have identical alleles that have not been identified before, they are assigned the same number.Obtaining the closest known alleles is a more time-consuming task than only reporting exact alleles and can be parallelized using multithreading, which reduces run time proportionally to the number of threads requested.To further streamline the submission process for the user, these new sequences can be exported to a fasta file using the -a option, making it easier to submit to the hosting web server for assignment.NG-MAST genogroups can also be requested from the NG-MAST results (-g option).This task is performed after typing and took 31.8min for the complete dataset of 2375 genomes.For different dataset sizes, run time increased linearly with size, with an average of 4.1 min for 25 genomes to an average of 16.5 min for 1600 genomes.

NG-STAR best represents gonococcal population structure
Previous comparisons of the distribution of gonococcal STs or CCs obtained from molecular typing on gonococcal phylogenomic trees have shown reasonable concordance values [9,12].In this study, using a collection of 2375 genomes from the Euro-GASP 2018 genomic survey, we observed that the scheme that best fitted the phylogenomic tree topology (Fig. 2a) was NG-STAR (RI=0.958,CI=0.846), followed by MLST (RI=0.964,CI=0.695),NG-STAR CCs (RI=0.947,CI=0.485),NG-MAST (RI=0.805,CI=0.622) and NG-MAST genogroups (RI=0.839,CI=0.430).Singleton STs (those appearing only once in a dataset), however, inflate these estimates, as they can fit on any topology, and this dataset contains 39 % of singletons for MLST (64 out of 164) and over 50 % of singletons for NG-STAR (214 out of 418 STs) and NG-MAST (245 out of 483).Better estimates have been previously described for the different schemes [9,12]; however, several factors may affect these calculations, such as the composition of the dataset.Low CIs were obtained from NG-STAR CCs and NG-MAST genogroups, indicating high homoplasy of these classification schemes in the tree.This can be expected as they group different NG-STAR STs and NG-MAST STs, respectively, following specific rules on the target loci.Genomic variability in the rest of the core genome due to mutation or recombination often splits isolates of the same ST in different genomic lineages of the phylogenetic tree.This fact is probably aggravated for NG-STAR CCs and NG-MAST genogroups.Nonetheless, the RI, which estimates monophyly of the classification on the tree, was very high for NG-STAR CCs, NG-STAR STs and MLST and this is reflected on the mapping over the phylogenomic tree (Fig. 2a).As expected, the highest fitness to the tree was obtained for the FastBAPS partitions (first level: RI=0.997,CI=0.821; second level: RI=0.996,CI=0.925), which were obtained from the genome alignment and using the phylogenetic tree as prior.These results re-confirm that WGS provides the most accurate representation of the population structure of an organism compared to existing typing methods.Supporting our results, on  [1] as calculated from SNPs in the non-recombining core genome [21] and represented in Microreact [3].The scale bar represents the number of nucleotide substitutions per site.Coloured strips next to the tree represent, in the following order, different N. gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR) sequence types (STs) [12], NG-STAR clonal complexes (NG-STAR CCs) [8], the type of penA mosaicisim, multi-locus sequence typing (MLST) STs [6,22], N. gonorrhoeae multi-antigen sequence typing (NG-MAST v2) STs [7,22] and NG-MAST genogroups [9,10], as obtained through pyngoST.The block on the right represents two partitioning levels obtained from FastBAPS [36].Colours were automatically assigned by Microreact.The top left graph represents the first two dimensions of a multiple correspondence analysis comparing the different typing schemes with the two hierarchichal levels obtained by FastBAPS.(b) Number of pairwise SNPs in the core genome of the top ten lineages of each of the typing schemes: NG-STAR, NG-STAR CCs, MLST, NG-MAST v2 and NG-MAST genogroups, and the two partitioning levels from FastBAPS.The colour of each lineage matches that in the coloured strips in (a).The tree and typing information obtained by pyngoST from this dataset can be explored using Microreact (https://microreact.org/project/wYpBzCs9A6Uf7HEMA6zmmY-eurogasp2018-pyngost).
a multiple correspondence analysis, we that NG-STAR and NG-STAR CCs the typing schemes that provide the closest classification to the FastBAPS partitions (Fig. 2a) and, thus, to the population structure obtained from genome data.
Finally, we evaluated the distribution of pairwise differences among the non-recombining core genome of isolates from the same lineage (ST, CCs and genogroups) independently of the phylogeny.The SNP distance in the non-recombining core genome of all the isolates in the dataset ranged from 0 to 3120.The distribution of pairwise SNPs among isolates from the top lineages of the five schemes clearly revealed the existence of genomic sublineages within the same ST, CC or genogroup (Fig. 2b).At a short scale, 95.9 % of isolates of the same NG-MAST STs, followed by 90.4 % of isolates of the same NG-STAR ST and 81.5 % of isolates of the same MLST ST, differed in fewer than 100 SNPs between each other.In NG-MAST genogroups and NG-STAR CCs, which both group different STs, these proportions were 88.1 and 57.6 %, respectively.At a higher scale, 99.4 % of NG-STAR STs (85.9 % of CCs) are formed by isolates with fewer than 1000 SNPs between each other, followed by 97.4 % of NG-MAST STs (96.4 % of genogroups) and 92.1 % of MLST STs.

CONCLUSIONS
Traditional molecular typing tools are still widely used for classifying gonococcal isolates into STs for epidemiological and surveillance studies.However, WGS is increasingly being used for these purposes as it provides the maximum discrimination and accuracy in resolution between isolates.From these genomes, we can directly extract the typing information, avoiding tedious amplification and sequencing reactions and contributing to the integration and comparability of results from epidemiological and surveillance studies that use WGS with those that use traditional molecular typing.pyngoST is a fast command-line Python tool for rapid and accurate multiple sequence typing of N. gonorrhoeae genome assemblies using any or all of the main typing schemes available: MLST, NG-MAST and NG-STAR.It can also output NG-STAR CCs for an easier representation of AMR lineages and calculate NG-MAST genogroups for reproducibility of results from previous studies.Genogroups are dependent on the dataset and change over time and, thus, their use is not encouraged, to the benefit of other schemes such as NG-STAR STs and CCs, which are comparable among studies.Finally, we show that NG-STAR is the scheme that best represents the genomic population structure of N. gonorrhoeae, highlighting the importance of antimicrobial use and AMR on the evolution of this sexually transmitted pathogen.

Fig.
Fig. Representation of the Neisseria gonorrhoeae genomic population (a) Maximum-likelihood phylogenetic tree of the Euro-GASP 2018 genomic dataset[1] as calculated from SNPs in the non-recombining core genome[21] and represented in Microreact[3].The scale bar represents the number of nucleotide substitutions per site.Coloured strips next to the tree represent, in the following order, different N. gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR) sequence types (STs)[12], NG-STAR clonal complexes (NG-STAR CCs)[8], the type of penA mosaicisim, multi-locus sequence typing (MLST) STs[6,22], N. gonorrhoeae multi-antigen sequence typing (NG-MAST v2) STs[7,22] and NG-MAST genogroups[9,10], as obtained through pyngoST.The block on the right represents two partitioning levels obtained from FastBAPS[36].Colours were automatically assigned by Microreact.The top left graph represents the first two dimensions of a multiple correspondence analysis comparing the different typing schemes with the two hierarchichal levels obtained by FastBAPS.(b) Number of pairwise SNPs in the core genome of the top ten lineages of each of the typing schemes: NG-STAR, NG-STAR CCs, MLST, NG-MAST v2 and NG-MAST genogroups, and the two partitioning levels from FastBAPS.The colour of each lineage matches that in the coloured strips in (a).The tree and typing information obtained by pyngoST from this dataset can be explored using Microreact (https://microreact.org/project/wYpBzCs9A6Uf7HEMA6zmmY-eurogasp2018-pyngost).

Table 1 .
Molecular typing schemes implemented in pyngoST

Scheme (source) No. of loci No. of STs* Loci (no. of alleles)* Sequence lengths (bases)
*Number of STs and alleles of each locus according to a database downloaded in August 2023.†Note that the optimized scheme NG-MAST v2 from PubMLST is used by default.MLST, Multi-Locus Sequence Typing; NG-MAST, N. gonorrhoeae Multi-Antigen Sequence Typing; NG-STAR, N. gonorrhoeae Sequence Typing for Antimicrobial Resistance.